-
Notifications
You must be signed in to change notification settings - Fork 35
Pass data and country package versions to APIv2 #2501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
anth-volk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @nikhilwoodruff. TL;DR: there are a couple of breaking changes here, and I wonder if there isn't a better way of doing some of this, given expected changes as part of https://github.com/PolicyEngine/issues/issues/378.
|
|
||
| # Get the current dataset version | ||
|
|
||
| version_file = download_huggingface_dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, blocking: I thought we weren't using Hugging Face within the API. I'd expect this information to come from GCP.
Feel free to correct me if I'm wrong. My understanding was that we're keeping the HF code in the -data packages, -core, and any non-API packages.
| with open(version_file, "r") as f: | ||
| version = json.load(f).get("version") | ||
|
|
||
| data_versions = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment, blocking: I think there are some concerns with this approach
First, data_versions would imply that there are multiple keys and values stored, but I'm only seeing one here. Second, I didn't realize that dataset args always had versions embedded; I'd be concerned that this code might break down if and when we ever pass non-versioned values; if the filepath also contains / characters, I imagine we'd start lopping off filename metadata if the version weren't present.
I'd recommend a different approach, perhaps using the code being added as part of https://github.com/PolicyEngine/issues/issues/378 to pull the version number from GCP metadata. Since we're committing to versioning all dataset files at once in a given dataset repo, I think we could just pass a version value as a value associated with a data_version key in the logging output.
| dataset.split("/")[-1]: version | ||
| } | ||
|
|
||
| country_package_versions = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This duplicates some of what we already have with v1_country_package_version and v2_country_package_version. I'd recommend moving to one model_version key and implementing after merging the entire chain proposed in https://github.com/PolicyEngine/issues/issues/378.
| "baseline_policy_id": baseline_policy_id, | ||
| "time_period": time_period, | ||
| "dataset": dataset, | ||
| "package_versions": country_package_versions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue, blocking: I believe this will fail a validation check for the V2V1Comparison schema, which isn't modified as part of this PR
|
Yes sorry this is out of date as per recent changes in the PRs to uk-data and .py and so will change. |
|
Closing, as we fixed via #2528. |
Fixes #2500
Requires PolicyEngine/policyengine.py#148